WebAnnotator, an Annotation Tool for Web Pages

Author

  • Xavier Tannier
Abstract

This article presents WebAnnotator, a new tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension, allowing annotation of both offline and online pages. The HTML rendering is fully preserved, and all annotations consist of new HTML spans with specific styles. WebAnnotator provides an easy-to-use, general-purpose framework and is made available under the CeCILL free license (close to the GNU GPL), so that use and further contributions are simple. All parts of an HTML document can be annotated: text, images, videos, tables, menus, etc. Annotations are created by simply selecting a part of the document and clicking on the relevant type and subtypes. The annotated elements are then highlighted in a specific color. Users can define their own annotation schemas by writing a simple DTD representing the types and subtypes that must be highlighted. Finally, annotations can be saved (as HTML with the annotated parts highlighted) or exported (in a machine-readable format).
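
The abstract describes three mechanisms: each annotation becomes a new HTML span with a specific style, schemas are written as a simple DTD of types and subtypes, and annotations can be exported in a machine-readable format. The Python sketch below illustrates those ideas on a plain text string rather than a live Firefox DOM. Everything concrete in it is an assumption for illustration, not WebAnnotator's actual format: the DTD conventions, the `wa-` class prefix, the `data-subtype` attribute, the colours, and the JSON export layout are all hypothetical.

```python
import html
import json
import re
from html.parser import HTMLParser

# Hypothetical user schema in DTD form: each ELEMENT is an annotation type and
# the enumerated values of its ATTLIST are the subtypes. The exact DTD
# conventions WebAnnotator expects are an assumption of this sketch.
SCHEMA_DTD = """
<!ELEMENT Person (#PCDATA)>
<!ATTLIST Person role (author|editor) #IMPLIED>
<!ELEMENT Place (#PCDATA)>
"""

# Hypothetical highlight colour per type; unknown types fall back to grey.
COLORS = {"Person": "#ffd700", "Place": "#90ee90"}


def parse_schema(dtd):
    """Map each declared type to its list of subtypes (possibly empty)."""
    types = {name: [] for name in re.findall(r"<!ELEMENT\s+(\w+)", dtd)}
    for name, values in re.findall(r"<!ATTLIST\s+(\w+)\s+\w+\s+\(([^)]*)\)", dtd):
        types.setdefault(name, []).extend(v.strip() for v in values.split("|"))
    return types


def annotate(text, start, end, ann_type, subtype, schema):
    """Wrap text[start:end] in a styled <span>; the rest of the text is kept."""
    if ann_type not in schema:
        raise ValueError(f"unknown type: {ann_type}")
    if subtype is not None and subtype not in schema[ann_type]:
        raise ValueError(f"unknown subtype for {ann_type}: {subtype}")
    color = COLORS.get(ann_type, "#dddddd")
    attrs = f'class="wa-{ann_type}" style="background-color:{color}"'
    if subtype is not None:
        attrs += f' data-subtype="{subtype}"'
    return (
        html.escape(text[:start])
        + f"<span {attrs}>" + html.escape(text[start:end]) + "</span>"
        + html.escape(text[end:])
    )


class SpanExporter(HTMLParser):
    """Collect type, subtype and text of every wa-* span (no nesting assumed)."""

    def __init__(self):
        super().__init__()
        self.records = []
        self.current = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        cls = a.get("class") or ""
        if tag == "span" and cls.startswith("wa-"):
            self.current = {"type": cls[3:], "subtype": a.get("data-subtype"), "text": ""}

    def handle_data(self, data):
        if self.current is not None:
            self.current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self.current is not None:
            self.records.append(self.current)
            self.current = None


def export(annotated_html):
    """Turn saved, highlighted HTML into a machine-readable JSON list."""
    parser = SpanExporter()
    parser.feed(annotated_html)
    parser.close()
    return json.dumps(parser.records)


schema = parse_schema(SCHEMA_DTD)  # {'Person': ['author', 'editor'], 'Place': []}
page = annotate("Xavier Tannier wrote WebAnnotator.", 0, 14, "Person", "author", schema)
print(export(page))
# [{"type": "Person", "subtype": "author", "text": "Xavier Tannier"}]
```

Keeping the type in the span's class and the subtype in a separate attribute lets the saved HTML render with its highlights while remaining easy to parse back into structured records. An in-browser tool would instead have to wrap DOM selections, which can cross element boundaries; this sketch sidesteps that by using string offsets.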

Similar resources

Creating Custom Taggers by Integrating Web Page Annotation and Machine Learning

We present an on-going work on a software package that integrates discriminative machine learning with the open source WebAnnotator system of Tannier (2012). The WebAnnotator system allows users to annotate web pages within their browser with custom tag sets. Meanwhile, we integrate the WebAnnotator system with a machine learning package which enables automatic tagging of new web pages. We hope...

A manually annotated HTML corpus for a novel scientific trend analysis

Here we present a manually annotated corpus of web pages and annotation tool for Web Content Mining. The corpus is extensively annotated, has a hierarchical label structure and is freely available for research purposes. The annotation tool is a Firefox extension which allows the annotator to work with the pages in their original appearance. This tool handles the annotation hierarchy independent...

On-the Fly Annotation of Dynamic Web Pages

The annotation of web pages is a critical task for the success of the semantic web. While many tools exist to facilitate the annotation of static web pages, annotation of dynamically generated ones has not been sufficiently addressed. This paper addresses the task of annotating web pages whose dynamic content is derived from a database. The approach adopted is based on annotating a database sch...

BOEMIE Ontology-Based Text Annotation Tool

The huge amount of the available information in the Web creates the need of effective information extraction systems that are able to produce metadata that satisfy user’s information needs. The development of such systems, in the majority of cases, depends on the availability of an appropriately annotated corpus in order to learn extraction models. The production of such corpora can be signific...

Information Extraction from Unstructured and Ungrammatical Data Sources for Semantic Annotation

The internet has become an attractive avenue for global e-business, e-learning, knowledge sharing, etc. Due to continuous increase in the volume of web content, it is not practically possible for a user to extract information by browsing and integrating data from a huge amount of web sources retrieved by the existing search engines. The semantic web technology enables advancement in information...

Publication date: 2012